This study examines the H-1B visa petitions filed by employers during Fiscal Year 2024. The project performs descriptive statistics, exploratory data analysis, and statistical tests. The main focus is on identifying the top geographic locations and industries for H-1B sponsorship and on testing whether an employer's filing volume is related to its approval outcomes. The key finding is that NAICS codes related to Information Technology and software companies rank highest in offering H-1B-sponsored job opportunities. Further analysis showed a high concentration of petitioning employers in specific states, such as California, Texas, New Jersey, New York, and Virginia. A Chi-squared test demonstrated a statistically significant association between the volume of initial petitions filed by an employer and their approval rate, with employers filing more petitions ('Large Filers') experiencing higher approval rates than those filing fewer ('Small Filers'). Overall, the FY2024 data suggests that H-1B sponsorship is concentrated geographically and by industry, and that higher petition volume correlates with higher initial success rates.
The H-1B visa program plays a significant role in the United States economy: it allows employers to temporarily employ foreign workers in specialized occupations. For many international students studying in the United States, securing post-graduation employment often involves the complex H-1B visa process, frequently as a transition from Optional Practical Training (OPT) or Curricular Practical Training (CPT). Understanding the trends in H-1B sponsorship is therefore vital for career planning and decision-making. This analysis identifies the companies and industries most actively sponsoring H-1B visas in FY2024 and the top geographic locations where H-1B opportunities are concentrated. Investigating this is crucial for international students because it helps identify potential employers willing to sponsor, highlights promising geographic areas for job searching, and provides realistic salary expectations.
To explore this, I took the Fiscal Year 2024 data from the publicly available H-1B Employer Data Hub.
This data set contains the approval and denial records of H-1B visa petitions. The counts of initial approvals, initial denials, continuing approvals, and continuing denials are aggregated by distinct completion fiscal year, two-digit NAICS code, tax ID, state, city, and ZIP code.
More details at: Understanding Our H-1B Employer Data Hub
Below is a glimpse of the raw data taken from the H-1B Employer Data Hub:
## Rows: 61,456
## Columns: 12
## $ Line.by.line <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9"…
## $ Fiscal.Year <int> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2…
## $ Employer..Petitioner..Name <chr> "", "", "1 800 FLOWERS COM INC", "1 HOTEL K…
## $ Tax.ID <int> 3581, 4245, 7311, 5669, 7999, 6496, 3805, 1…
## $ Industry..NAICS..Code <chr> "54 - Professional, Scientific, and Technic…
## $ Petitioner.City <chr> "LAFAYETTE", "DAVIE", "JERICHO", "MIAMI", "…
## $ Petitioner.State <chr> "CA", "FL", "NY", "FL", "OR", "SC", "CA", "…
## $ Petitioner.Zip.Code <int> 94549, 33328, 11753, 33133, 97367, 29708, 9…
## $ Initial.Approval <chr> "0", "1", "1", "0", "1", "1", "1", "0", "0"…
## $ Initial.Denial <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
## $ Continuing.Approval <chr> "1", "0", "5", "1", "0", "1", "0", "1", "3"…
## $ Continuing.Denial <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
I performed the following operations to clean the data and prepare it for further analysis:
I simplified and renamed the columns to make them clearer using select().
I replaced empty strings with NA using na_if() and changed any NA values in the approval/denial columns to 0 using coalesce().
I removed rows with a missing Tax_ID or Employer_Name, and also removed rows where all the approval/denial values were zero.
I added new columns for the total H-1Bs (Total_H1B), new H-1Bs (Total_Lottery_H1B), and renewal H-1Bs (Total_Other) using mutate().
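The cleaning steps above can be sketched roughly as follows. This is a minimal illustration rather than the exact pipeline used for the report: the toy rows are invented, only a subset of columns is kept, and the definitions of Total_Lottery_H1B (initial approvals plus denials) and Total_Other (continuing approvals plus denials) are assumptions inferred from the summary table. It also assumes the character count columns hold plain digit strings.

```r
library(dplyr)

# Toy rows mimicking the raw column layout; all values are invented.
raw <- data.frame(
  Employer..Petitioner..Name = c("", "ACME INC", "BETA LLC", "GAMMA CORP", "DELTA LTD"),
  Tax.ID = c(3581L, 7311L, NA, 5669L, 1234L),
  Petitioner.State = c("CA", "NY", "TX", "FL", "NJ"),
  Initial.Approval = c("0", "1", "2", "", "3"),
  Initial.Denial = c(0L, 0L, 1L, 0L, 1L),
  Continuing.Approval = c("1", "5", "0", "0", "0"),
  Continuing.Denial = c(0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

count_cols <- c("Initial_Approval", "Initial_Denial",
                "Continuing_Approval", "Continuing_Denial")

clean <- raw %>%
  # Rename while selecting so later steps use short, clear names.
  select(
    Employer_Name = Employer..Petitioner..Name,
    Tax_ID = Tax.ID,
    State = Petitioner.State,
    Initial_Approval = Initial.Approval,
    Initial_Denial = Initial.Denial,
    Continuing_Approval = Continuing.Approval,
    Continuing_Denial = Continuing.Denial
  ) %>%
  # Empty strings become NA; character counts become integers.
  mutate(
    Employer_Name = na_if(Employer_Name, ""),
    Initial_Approval = as.integer(na_if(Initial_Approval, "")),
    Continuing_Approval = as.integer(na_if(Continuing_Approval, ""))
  ) %>%
  # Missing counts are treated as zero.
  mutate(across(all_of(count_cols), ~ coalesce(.x, 0L))) %>%
  # Drop rows that cannot be attributed to an employer.
  filter(!is.na(Tax_ID), !is.na(Employer_Name)) %>%
  # Derived totals (assumed definitions, see the note above).
  mutate(
    Total_Lottery_H1B = Initial_Approval + Initial_Denial,
    Total_Other = Continuing_Approval + Continuing_Denial,
    Total_H1B = Total_Lottery_H1B + Total_Other
  ) %>%
  # Drop rows where every approval/denial count is zero.
  filter(Total_H1B > 0)

print(clean)
```

In this toy input, the unnamed employer, the row without a Tax_ID, and the all-zero row are dropped, leaving only ACME INC and DELTA LTD.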
| variable | n | min | max | median | q1 | q3 | iqr | mad | mean | sd | se | ci |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Initial_Approval | 52775 | 0 | 918 | 1 | 0 | 1 | 1 | 1.483 | 2.406 | 14.820 | 0.065 | 0.126 |
| Initial_Denial | 52775 | 0 | 87 | 0 | 0 | 0 | 0 | 0.000 | 0.068 | 0.643 | 0.003 | 0.005 |
| Continuing_Approval | 52775 | 0 | 977 | 1 | 0 | 2 | 2 | 1.483 | 3.680 | 20.008 | 0.087 | 0.171 |
| Continuing_Denial | 52775 | 0 | 179 | 0 | 0 | 0 | 0 | 0.000 | 0.087 | 1.268 | 0.006 | 0.011 |
| Total_H1B | 52775 | 1 | 1431 | 1 | 1 | 3 | 2 | 0.000 | 6.241 | 30.795 | 0.134 | 0.263 |
| Total_Lottery_H1B | 52775 | 0 | 922 | 1 | 0 | 1 | 1 | 1.483 | 2.474 | 14.978 | 0.065 | 0.128 |
| Total_Other | 52775 | 0 | 1045 | 1 | 0 | 2 | 2 | 1.483 | 3.767 | 20.392 | 0.089 | 0.174 |
The median for most count variables is low (0 or 1), while the mean is substantially higher. The standard deviations are also large relative to the medians (for example, 30.795 for Total_H1B against a median of 1), and the maximum values run into the hundreds or thousands, further confirming the wide spread and skewness. The summary statistics therefore show that H-1B petition counts are strongly right-skewed: most employers file very few petitions, while a small number file extremely high volumes. These high-volume outliers appear genuine and represent significant H-1B activity, so they are retained in the analysis.
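As a quick arithmetic check of the skewness argument, the sketch below takes three rows of the summary table (values copied from the table above) and computes the mean-to-median ratio and the coefficient of variation (sd divided by mean). The column names mean_to_median and cv are introduced here for illustration only.

```r
# Median, mean, and sd copied from the summary table above.
summary_stats <- data.frame(
  variable = c("Initial_Approval", "Continuing_Approval", "Total_H1B"),
  median = c(1, 1, 1),
  mean = c(2.406, 3.680, 6.241),
  sd = c(14.820, 20.008, 30.795)
)

# A mean well above the median points to a long right tail.
summary_stats$mean_to_median <- summary_stats$mean / summary_stats$median

# A coefficient of variation far above 1 indicates a very wide spread.
summary_stats$cv <- summary_stats$sd / summary_stats$mean

print(summary_stats)
```

In every row the mean is more than twice the median and the standard deviation is several times the mean, which is the pattern expected from a heavily right-skewed count distribution.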
The histograms confirm that the distributions are heavily right-skewed. Even on the log scale, most of the frequency is concentrated at the lower counts. This is consistent with the summary table, where the mean exceeds the median for every category: most employers file very few H-1B petitions, but a small number file a very large number, pulling the average up. The log scale was essential to visualize this distribution.
In the box plots, the main box (IQR) for each variable is compressed near the bottom, indicating that the middle 50% of employers fall within a narrow range of low counts. At the same time, there is wide variation in H-1B filing activity among employers. The numerous outliers represent employers with exceptionally high petition volumes compared to the typical filer. As discussed above, these are usually genuine (for example, large tech companies, consulting firms, and universities) and represent a significant portion of the total H-1B activity.
The map above shows the frequency of H-1B petitions by state.
The bar chart shows the states with the highest number of unique employers filing H-1B petitions in FY2024. California, Texas, New Jersey, New York, and Virginia lead the ranking.
This analysis shows the specific cities with the highest number of employers filing H-1B petitions, including New York, San Francisco, and Houston.
The top ZIP codes often point to specific corporate campuses, office parks, or university areas known for high concentrations of H-1B employers. For example, ZIP codes associated with major tech company headquarters or large consulting firm offices may appear frequently.
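The state, city, and ZIP code rankings above all come down to counting distinct employers per location. A minimal sketch of that counting step is shown below, assuming a cleaned data frame with Employer_Name and State columns; the rows are invented for illustration and the Unique_Employers column name is hypothetical.

```r
library(dplyr)

# Invented example rows; one employer can appear on several rows.
employers <- data.frame(
  Employer_Name = c("ACME INC", "ACME INC", "BETA LLC", "GAMMA CORP", "DELTA LTD"),
  State = c("CA", "CA", "CA", "TX", "NY"),
  stringsAsFactors = FALSE
)

top_states <- employers %>%
  group_by(State) %>%
  # n_distinct() counts each employer once per state.
  summarise(Unique_Employers = n_distinct(Employer_Name), .groups = "drop") %>%
  # Highest counts first; ties broken alphabetically by state.
  arrange(desc(Unique_Employers), State)

print(top_states)
```

Replacing State with a city or ZIP code column in group_by() would give the corresponding ranking at a finer geographic level.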
The chart shows the most frequent NAICS codes associated with employers filing H-1B petitions. Codes related to IT services (such as Custom Computer Programming Services, 541511, and Computer Systems Design Services, 541512), Software Publishing (511210), Management Consulting (541611), and potentially higher education or research often dominate. Because the Data Hub reports only two-digit NAICS codes, most of these activities appear under sector 54 (Professional, Scientific, and Technical Services), while software publishing falls under sector 51 (Information). Surprisingly, a large number of employers have no NAICS code in the raw data; this missing category ranks 7th.
For more details on NAICS codes, see https://www.census.gov/naics/
I analyze the H-1B FY2024 data to check whether there is a statistically significant relationship between the size of the petitioning company and the outcome of initial H-1B petitions (Approval vs. Denial).
Company size is not available in the data set, so I use the total number of initial H-1B petitions filed by each unique employer as a proxy for company size. Employers are categorized by whether they filed more than the median number of initial petitions ('Large Filer') or not ('Small Filer'), and a Chi-squared test of independence compares approval outcomes between these groups.
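A rough sketch of this categorization step is given below. The employer rows are invented, and Total_Initial and median_initial are hypothetical names; the only parts taken from the report are the median split, the Size_Category labels, and the aggregated approved/denied columns.

```r
library(dplyr)

# Invented employer-level rows for illustration.
petitions <- data.frame(
  Employer_Name = c("A", "B", "C", "D", "E"),
  Initial_Approval = c(1L, 0L, 12L, 1L, 3L),
  Initial_Denial = c(0L, 1L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Aggregate to one row per employer, then count initial petitions.
by_employer <- petitions %>%
  group_by(Employer_Name) %>%
  summarise(
    Initial_Approval = sum(Initial_Approval),
    Initial_Denial = sum(Initial_Denial),
    .groups = "drop"
  ) %>%
  mutate(Total_Initial = Initial_Approval + Initial_Denial)

median_initial <- median(by_employer$Total_Initial)

# Strictly above the median is 'Large Filer'; everything else is 'Small Filer'.
by_employer <- by_employer %>%
  mutate(Size_Category = if_else(Total_Initial > median_initial,
                                 "Large Filer", "Small Filer"))

# Employers per category, then approvals and denials per category.
print(table(by_employer$Size_Category))

contingency <- by_employer %>%
  group_by(Size_Category) %>%
  summarise(
    Total_Initial_Approved = sum(Initial_Approval),
    Total_Initial_Denied = sum(Initial_Denial),
    .groups = "drop"
  )
print(contingency)
```

Because the median is a small integer in heavily skewed count data, many employers sit exactly at the median, so the two groups are not equal in size.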
| Size_Category | Number of employers |
|---|---|
| Large Filer | 13004 |
| Small Filer | 39771 |
| Size_Category | Total_Initial_Approved | Total_Initial_Denied |
|---|---|---|
| Small Filer | 18307 | 621 |
| Large Filer | 108672 | 2983 |
Splitting at the median gives 13,004 Large Filers (likely big companies, universities, etc.) and 39,771 Small Filers (likely startups or employers new to the H-1B program).
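The initial approval rate for each group follows directly from the approved and denied counts in the table above. The sketch below rebuilds that table as a matrix (the object name contingency_table matches the test output; approval_rate is a name introduced here) and computes the rates.

```r
# Approved/denied counts copied from the table above.
contingency_table <- matrix(
  c(18307, 621,
    108672, 2983),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Size_Category = c("Small Filer", "Large Filer"),
    Outcome = c("Total_Initial_Approved", "Total_Initial_Denied")
  )
)

# Approval rate = approved / (approved + denied), per size category.
approval_rate <- contingency_table[, "Total_Initial_Approved"] /
  rowSums(contingency_table)

print(round(100 * approval_rate, 2))
```

This works out to roughly 96.7% for Small Filers and 97.3% for Large Filers, a gap of about 0.6 percentage points.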
We perform the Chi-squared test to determine if there’s a statistically significant association between the company size category and the initial petition outcome.
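Hypotheses:
H₀: There is no association between Size_Category (Small/Large Filer) and the initial petition outcome (Approval/Denial).
H₁: There is an association between Size_Category and the initial petition outcome.
Assumptions:
Both variables are categorical: Size_Category (Small/Large Filer) and Outcome (Approval/Denial).
All expected cell counts are at least 5 (see the assumption check in the output below).
##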
## Pearson's Chi-squared test with simulated p-value (based on 5000
## replicates)
##
## data: contingency_table
## X-squared = 22.383, df = NA, p-value = 2e-04
## Total_Initial_Approved Total_Initial_Denied
## Small Filer 18405.6 522.3996
## Large Filer 108573.4 3081.6004
##
## Assumption Check: Are all expected counts >= 5? TRUE
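The test statistic and expected counts above can be reproduced from the contingency table alone. The sketch below uses base R's chisq.test() without continuity correction and with the asymptotic p-value, which is a simplification: the output above used a simulated p-value (simulate.p.value = TRUE with B = 5000), so the p-value differs numerically while the statistic, expected counts, and decision at alpha = 0.05 are the same.

```r
# Approved/denied counts copied from the contingency table in the report.
contingency_table <- matrix(
  c(18307, 621,
    108672, 2983),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Size_Category = c("Small Filer", "Large Filer"),
    Outcome = c("Total_Initial_Approved", "Total_Initial_Denied")
  )
)

# Pearson Chi-squared test of independence, no continuity correction.
res <- chisq.test(contingency_table, correct = FALSE)

print(res$statistic)            # Pearson X-squared
print(round(res$expected, 1))   # expected counts under independence
print(all(res$expected >= 5))   # assumption check on expected counts
print(res$p.value < 0.05)       # decision at the 5% level
```

The observed Small Filer approvals (18,307) fall below the expected 18,405.6, while the observed denials (621) exceed the expected 522.4, which is what drives the statistic.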
The Chi-squared test result is interpreted using the p-value and a significance level of alpha = 0.05.
Decision: Since the p-value (2e-04) is less than alpha (0.05), we reject the null hypothesis (H₀).
Conclusion:
There is a statistically significant association between the company size category (Small Filer vs. Large Filer, based on petition volume) and the initial H-1B petition outcome (Approval/Denial) at the 5% significance level.
The analysis suggests that the likelihood of an initial H-1B petition being approved differs between companies filing at most one initial petition ('Small Filers') and those filing more than one ('Large Filers').
In this data, Large Filers had an initial approval rate of about 97.3% (108,672 of 111,655 petitions) versus about 96.7% (18,307 of 18,928) for Small Filers, so companies filing a larger volume of initial H-1B petitions tend to have a slightly higher approval rate.
This project analyzed H-1B visa petition data for FY2024 to understand current trends in employer sponsorship, focusing on which types of companies and locations are most active and whether company size relates to petition success.
Key Findings:
H-1B petitioning activity is concentrated in specific states, with California, Texas, New Jersey, New York, and Virginia showing the highest number of unique employers filing petitions. Major metropolitan areas within these states are hotspots.
Industries related to Information Technology are overwhelmingly the most common sectors for employers filing H-1B petitions.
The statistical analysis investigated whether the number of petitions a company files (used as an indicator of its size or H-1B activity level) is related to its success in getting initial petitions approved. Companies that filed a larger number of initial H-1B petitions (classified as 'Large Filers' for filing more than the median number) generally had a higher initial approval rate than companies filing fewer petitions ('Small Filers').
This suggests that employers with higher H-1B petition volumes tend to experience greater success rates for their initial petitions, although the difference is small: roughly 0.6 percentage points (about 97.3% versus 96.7%).
Limitations:
Proxy for company size: A key limitation was using the number of petitions filed as a proxy for actual company size. While the two are related, this is not a perfect measure. A company could file many petitions but still be relatively small, or vice versa.
Employer identification: Grouping the data relied on employer names. Minor differences in how names were recorded could split a single company into multiple entries, slightly affecting counts (for example, AMAZON DEVELOPMENT CENTER U S INC, AMAZON DATA SERVICES INC, and AMAZON COM SERVICES LLC).
Single-year snapshot: This analysis is based on FY2024 data only. It does not show trends over multiple years or explain why petitions were approved or denied.
Next Steps:
If possible, adding company employee counts could provide a more accurate comparison based on company size.
Repeating the size vs. approval rate analysis within specific dominant industries (like IT) or states (like California) could reveal whether this trend holds across different segments.
A further step is to investigate whether a similar relationship exists between company size or petition volume and the success rate of continuing (renewal) petitions.
Using FY2024 as a template, the same analysis could be repeated on the data for all available fiscal years to identify longer-term trends.
Geethakrishna Puligundla (GK) is graduating in May 2025 with a Master's degree in Computer Science. He is a self-described nerd who likes to work on nuanced problems and to go down the rabbit hole (deep research) when a topic excites him. He is interested in building new products and software that help society and make the world a better place. He is currently looking for a new opportunity after graduation.
His philosophy is "Be curious! Solve hard problems."
Mail: pgeethakrishna@gmail.com